LWS-Det: Layer-Wise Search for 1-bit Detectors

FIGURE 6.11

An illustration of the binarization error in the three-dimensional space. (a) The intersection angle $\theta$ between the real-valued weight $w$ and activation $a$ is significant. (b) After binarization $(\hat{w}, \hat{a})$ based on the sign function, the intersection angle $\hat{\theta} = 0$. (c) $\hat{\theta} = 0$ based on XNOR-Net binarization. (d) Ideal binarization via angular and amplitude error minimization.

illustrated in Fig. 6.10. As depicted above, the main learning objective (layer-wise binarization error) is defined as

E = \sum_{i=1}^{N} \left\| a_{i-1} \otimes w_i - \hat{a}_{i-1} \otimes \hat{w}_i \circ \alpha_i \right\|_2^2,    (6.69)

where N is the number of binarized layers. We then optimize E layer-wise as

\arg\min_{\hat{w}_i, \alpha_i} E_i(\hat{w}_i, \alpha_i; w_i, a_{i-1}, \hat{a}_{i-1}), \quad \forall i \in [1, N].    (6.70)

In LWS-Det, we learn Eq. 6.70 by decoupling it into an angular loss and an amplitude loss, where we optimize the angular loss by differentiable binarization search (DBS) and the amplitude loss by learning the scale factor.
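
To make the objective concrete, the following is a minimal PyTorch sketch of the layer-wise error in Eq. 6.69 for a single convolutional layer. The function name, tensor shapes, padding, and the per-output-channel form of $\alpha_i$ are illustrative assumptions, not the exact LWS-Det implementation.

```python
# Minimal sketch of the layer-wise binarization error of Eq. (6.69), assuming
# PyTorch; conv2d stands in for the convolution between activation and weight.
import torch
import torch.nn.functional as F

def layer_binarization_error(a_prev, w, alpha):
    """Squared error between the real-valued and binarized convolutions of one layer.

    a_prev : real-valued input activation a_{i-1}, shape (B, C_in, H, W)
    w      : real-valued weight w_i, shape (C_out, C_in, k, k)
    alpha  : per-output-channel scale alpha_i, shape (C_out,)  (an assumption)
    """
    real_out = F.conv2d(a_prev, w, padding=1)                          # a_{i-1} conv w_i
    bin_out = F.conv2d(torch.sign(a_prev), torch.sign(w), padding=1)   # binarized branch
    bin_out = bin_out * alpha.view(1, -1, 1, 1)                        # scale by alpha_i
    return (real_out - bin_out).pow(2).sum()

# E in Eq. (6.69) sums this error over all N binarized layers, e.g.:
# E = sum(layer_binarization_error(a, w, alpha) for (a, w, alpha) in layers)
```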

6.4.3 Differentiable Binarization Search for the 1-Bit Weight

We formulate the binarization task as a differentiable search problem. Considering that the 1-bit weight is closely related to the angular error, as shown in Fig. 6.11, we define an angular loss to supervise our search process as

L^{Ang}_i = \left\| \cos\theta_i - \cos\hat{\theta}_i \right\|_2^2 = \left\| \frac{a_{i-1} \otimes w_i}{\|a_{i-1}\|_2 \|w_i\|_2} - \frac{\hat{a}_{i-1} \otimes \hat{w}_i}{\|\hat{a}_{i-1}\|_2 \|\hat{w}_i\|_2} \right\|_2^2.    (6.71)

For the learning process of the $i$-th layer, the objective is formulated as

\arg\min_{\hat{w}_i} L^{Ang}_i(\hat{w}_i; a_{i-1}, w_i, \hat{a}_{i-1}).    (6.72)
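
As a rough illustration of Eq. 6.71, the sketch below computes an angular loss between the real-valued and binarized convolutions in PyTorch. Treating the norms as global tensor norms and producing $\hat{a}_{i-1}$ with the sign function are simplifying assumptions; the candidate 1-bit weight $\hat{w}_i$ would be supplied by the differentiable binarization search rather than fixed as it is here.

```python
# Hedged sketch of the angular loss in Eq. (6.71); the normalization by global
# L2 norms is a simplification, not the exact LWS-Det / DBS implementation.
import torch
import torch.nn.functional as F

def angular_loss(a_prev, w, w_bin, eps=1e-8):
    """|| cos(theta_i) - cos(theta_hat_i) ||_2^2 for one layer (illustrative).

    a_prev : real-valued activation a_{i-1}, shape (B, C_in, H, W)
    w      : real-valued weight w_i, shape (C_out, C_in, k, k)
    w_bin  : candidate 1-bit weight produced by the search, same shape as w
    """
    a_bin = torch.sign(a_prev)                                         # binarized activation
    cos_real = F.conv2d(a_prev, w) / (a_prev.norm() * w.norm() + eps)
    cos_bin = F.conv2d(a_bin, w_bin) / (a_bin.norm() * w_bin.norm() + eps)
    return (cos_real - cos_bin).pow(2).sum()
```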